Code to run analysis and generate figures for McNeall et al. (2023) “Constraining the carbon cycle in JULES-ES-1.0”
What are the data?
Failure analysis

Failure analysis marginal plots
Are there hard thresholds after which the model always fails? (Question from review comments). Run failures are in t
null device
1
## Is there a pattern to the wave01 timeseries outliers?
null device
1
Visualising the ensemble range
It’s important to remember that the design of the experiment is multiplication factors of the original parameters. This might be important for the “hold” value in a sensitivity analysis, as the “standard” value and the median value of the ensemble will not be the same.

Wave00/Wave01 Ensemble behaviour in key (constraining) outputs.
Global mean for the 20 years at the end of the 20th Century. There is still a significant low bias on cVeg output.

What proportion of wave01 fall within Andy Wiltshire’s constraints?
Just under a third. Points at a significant model discrepency in cVeg
Of the 400 members of the wave01 ensemble, 128 pass Andy Wiltshire’s Level 2 constraints.
[1] 128
[1] 0.32
null device
1
null device
1
Exploring summary of constraint: Cumulative NBP
Carbon budget data
Section 2.5 in Friedlingstein et al. describes how the land carbon sink is estimated.
IPCC AR6 Chapter 5 states: “The net land carbon sink is taken as net biome productivity (NBP) and so includes any modelled net land-use change emissions. Further details on data sources and processing are available in the chapter data table (Table 5.SM.6).”
The GCP says: “The land sink is the average of several dynamic global vegetation models that reproduce the observed mean total land sink of the 1990s.”
AR6 Chapter 5 states: "The land carbon cycle components of historical ESM simulations show a larger range, with simulated cumulative land carbon uptake (1850–2014) spanning the range from –47 to +21 GtC, compared to the GCP estimate of –12 ± 50 GtC (Figure 5.23b).




How much are all select output constrained?

Constraint of output at level 2
Proportion of the initial range of model output that is covered by level 2 constrained ensemble (%):
npp nbp cSoil cVeg lai_lnd_mean rh_lnd_sum
26.6 55.1 48.3 23.5 60.9 26.0
fLuc_lnd_sum fHarvest_lnd_sum treeFrac_lnd_mean shrubFrac_lnd_mean baresoilFrac_lnd_mean c3PftFrac_lnd_mean
51.5 60.7 88.3 85.4 52.9 58.0
c4PftFrac_lnd_mean
43.9
null device
1
Pairs plot of 2d projections of constrained output space.

Constraining to level 2 with the emulator

---
title: "Constraining the carbon cycle in JULES-ES-1.0"
output:
  html_document:
    toc: yes
    toc_depth: '2'
    df_print: paged
  html_notebook:
    toc: yes
    toc_float: yes
    toc_depth: 2
    number_sections: yes
---


Code to run analysis and generate figures for McNeall et al. (2023) "Constraining the carbon cycle in JULES-ES-1.0"

```{r, echo = FALSE, message = FALSE, warning=FALSE, results = 'hide'}
# Load helper functions

knitr::opts_chunk$set(fig.path = "figs/", echo = FALSE, message = FALSE, warnings = FALSE)


```

```{r}

source("JULES-ES-1p0-common-packages.R")
source("JULES-ES-1p0-common-functions.R")
source("JULES-ES-1p0-common-data.R")

```

## What are the data?
```{r}
getStandardMemberLongName <- function(ensloc, variable){
  # Helper function to get the long name of output variables (and the units)
  
  ensmember <- 'S3'
  fn <- paste0(ensloc,'JULES-ES-1p0_',ensmember,'_Annual_global.nc')

  nc <- nc_open(paste0(fn))
  ln <- nc$var[[variable]]$longname
  un <- nc$var[[variable]]$units      
  
  return(list(longname = ln, units = un))
  
}

# for(i in y_names_select){
#   
#   out <- getStandardMemberLongName(ensloc_wave00, i)
#   print(i)
#   print(out)
# }

```



## Failure analysis

```{r failure-pairs, fig.width=12, fig.height=12, fig.path='figs/', dev=c('png', 'pdf')}


low_npp_ix <- which(Y[,'npp_nlim_lnd_sum'] < 1e5)
# code from https://stackoverflow.com/questions/28182872/how-to-use-different-sets-of-data-in-lower-and-upper-panel-of-pairs-function-in


#X <- matrix(runif(300), ncol=3)
#Y <- matrix(c(sort(runif(100, 0, 10)), 
#              sort(runif(100, 0, 10)), 
#              sort(runif(100, 0, 10))), ncol=3)

#pdf(file = 'figs/fig02.pdf', width = 12, height = 10)
#pdf(file = 'figs/run-failure-pairs.pdf', width = 12, height = 10)
x1 <- X[low_npp_ix, ]
x2 <- X_nlevel0

XY <- rbind(x1, x2)


pairs(XY,
      lower.panel=function(x, y, ...) {
        Xx <- x[seq_len(nrow(x1))] # corresponds to X subset
        Xy <- y[seq_len(nrow(x1))] # corresponds to X subset
        #usr <- par("usr"); on.exit(par(usr))
        #par(usr = c(range(x1[, -ncol(x1)]), range(x1[, -1]))) # set up limits
        points(Xx, Xy, col = zblue, pch = 19, cex = 0.8)
       # if(par('mfg')[2] == 1) axis(2) # if left plot, add left axis
        #if(par('mfg')[1] == ncol(x1)) axis(1) # if bottom plot add bottom axis
      }, 
      upper.panel=function(x, y, ...) {
        Yx <- x[(nrow(x1) + 1):length(x)] # Y subset
        Yy <- y[(nrow(x1) + 1):length(y)] # Y subset
        
        #cntr <- outer(Yx, Yx, FUN='*') # arbitrary function for contour
       # usr <- par("usr"); on.exit(par(usr))
        #par(usr = c(range(x2[, -1]), range(x2[, -ncol(x2)]))) # set up limits
        points(Yx, Yy, col = zred, pch = 19, cex = 0.8)
        #contour(Yx, Yy, cntr, add=TRUE)
        #if(par('mfg')[2] == ncol(x2)) axis(4) # if right plot, add right axis
        #if(par('mfg')[1] == 1) axis(3) # if top plot, add top axis
      }, 
      #tick=FALSE, # suppress the default tick marks
      #line=1,
      gap = 0,
      xlim = c(0,1), ylim = c(0,1),
      labels = 1:d,
      oma = c(2, 18, 2, 2)) # move the default tick labels off the plot 

reset()

legend('left', legend = paste(1:d, colnames(lhs)), cex = 1.1, bty = 'n')
legend('topleft', pch = 19, col = c( zred, zblue), legend = c('failed', 'zero carbon cycle'), bty = 'n', inset = 0.02, cex = 1.1 )

#dev.off()

```


## Failure analysis marginal plots 

Are there hard thresholds after which the model always fails? (Question from review comments).
Run failures are in t

```{r, fig.width = 10, fig.height = 10}


y_level0 <- Y_level0[,'npp_nlim_lnd_sum']

pdf(file = 'figs/marginal_failures.pdf', width = 10, height = 10)
par(mfrow = c(5,7), mar = c(3,1,3,1), oma = c(1,1,5,1))
for(i in 1:p){
  plot(X_level0[,i], y_level0, xlab = '', ylab = '', main = colnames(X_level0)[i], xlim = c(0,1))
  points(X_nlevel0[,i], rep(0, nrow(X_nlevel0)), col = zred, pch = 19)
  points(x1[, i ], rep(0,nrow(x1)), col = zblue, pch = 19)
  
}
mtext('NPP', outer = TRUE, side = 3, cex = 2, line = 2)
reset()
legend('bottomright', pch = 19, col = c('black', zred, zblue), legend = c('ran','failed', 'zero carbon cycle'), bty = 'n', inset = 0.05, cex = 1.1 )
dev.off()

```

 ## Is there a pattern to the wave01 timeseries outliers?
```{r, fig.width= 12, fig.height = 12}

pdf(file = 'figs/X_wave01_timeseries_outliers.pdf', width = 12, height = 12)
pairs(X_wave01_train[ts_outliers_ix_wave01, ],
      gap = 0, 
      xlim = c(0,1),
      ylim = c(0,1)
)
dev.off()
      
      


```




# Visualising the ensemble range

It's important to remember that the design of the experiment is multiplication factors of the original parameters. This might be important for the "hold" value in a sensitivity analysis, as the "standard" value and the median value of the ensemble will not be the same.

```{r, fig.width = 6, fig.height = 8}

#pdf(file = 'figs/lhs_range.pdf', width = 6, height = 8)
#pdf(file = 'figs/fig01.pdf', width = 6, height = 8)
par(las = 1, mar = c(5,8,2,1))
lhs_min <- apply(lhs_wave0_wave01_all, 2, min)
lhs_max <- apply(lhs_wave0_wave01_all,2, max)

plot(lhs_max, 1:d, type = 'n', xlim = c(0,10), axes = FALSE, xlab = 'multiplying factor', ylab = '')

abline(v = 0, lty = 'dashed', col = 'grey')
abline(v = 1, lty = 'dashed', col = 'tomato2')

segments(x0 = lhs_min, y0 = 1:d, x1 = lhs_max, y1 = 1:d )
points(lhs_min, 1:d, pch = 20)
points(lhs_max, 1:d, pch = 20)
axis(2, at = 1:d, labels = colnames(lhs))
axis(1)
#dev.off()

```

## Wave00/Wave01  Ensemble behaviour in key (constraining) outputs. 

Global mean for the 20 years at the end of the 20th Century. There is still a significant low bias on cVeg output.

```{r, fig.width = 8, fig.height = 8}
wave00col <- 'skyblue2'
wave01col <- 'tomato2'

wave00col <- 'dodgerblue2'
wave01col <- 'firebrick'
rangecol <- 'grey'
```


```{r, fig.width = 8, fig.height = 8 }
# Histogram of level 1 constraints
hcol = 'darkgrey'
lcol = 'black'

#pdf(file = 'figs/fig05.pdf', width = 8, height = 8)
#pdf(file = 'figs/level_2_constraints_hists.pdf', width = 8, height = 8)
par(mfrow = c(2,2), fg = 'darkgrey', las = 1, oma = c(0.1, 0.1, 4, 0.1))

trunc <- function(x, vec){
  
  dat <- x[x < max(vec) & x > min(vec)  ]
  
  dat
  
}


h <- hist(Y_const_level1a_scaled[,'nbp_lnd_sum'], main = 'NBP', xlab = 'GtC/year', col = makeTransparent(wave00col,150))
hist(trunc(Y_const_wave01_scaled [,'nbp_lnd_sum'], h$breaks) ,
     col = makeTransparent(wave01col,150) , breaks = h$breaks, add = TRUE)

rug(Y_const_stan_scaled['nbp_lnd_sum'], lwd = 2)

polygon(x = c(0, 100, 100, 0), y = c(0, 0, 1000, 1000),
        col = makeTransparent(rangecol, 60),
        border = makeTransparent(rangecol))

h <- hist(Y_const_level1a_scaled[,'npp_nlim_lnd_sum'],col = makeTransparent(wave00col,150), main = 'NPP', xlab = 'GtC/year')
hist(trunc(Y_const_wave01_scaled [,'npp_nlim_lnd_sum'], h$breaks) , 
     col = makeTransparent(wave01col) , breaks = h$breaks, add = TRUE)

rug(Y_const_stan_scaled['npp_nlim_lnd_sum'], lwd = 2)

polygon(x = c(35, 80, 80, 35), y = c(0, 0, 1000, 1000),
        col = makeTransparent(rangecol, 60),
        border = makeTransparent(rangecol))


h <- hist(Y_const_level1a_scaled[,'cSoil_lnd_sum'], col = makeTransparent(wave00col,150), main = 'Soil Carbon', xlab = 'GtC')
hist(trunc(Y_const_wave01_scaled [,'cSoil_lnd_sum'], h$breaks) , 
     col = makeTransparent(wave01col,150) , breaks = h$breaks, add = TRUE)

rug(Y_const_stan_scaled['cSoil_lnd_sum'], lwd = 2)

polygon(x = c(750, 3000, 3000, 750), y = c(0, 0, 1000, 1000),
        col = makeTransparent(rangecol, 60),
        border = makeTransparent(rangecol))

h <- hist(Y_const_level1a_scaled[,'cVeg_lnd_sum'], col = makeTransparent(wave00col,150), main = 'Vegetation Carbon', xlab = 'GtC')
hist(trunc(Y_const_wave01_scaled [,'cVeg_lnd_sum'], h$breaks) , 
   col = makeTransparent(wave01col,150)  , breaks = h$breaks, add = TRUE)

rug(Y_const_stan_scaled['cVeg_lnd_sum'], lwd = 2)

polygon(x = c(300, 800, 800, 300), y = c(0, 0, 1000, 1000),
        col = makeTransparent(rangecol, 60),
       border =  makeTransparent(rangecol))



reset()

legend('top', horiz = TRUE, fill = c(makeTransparent(wave00col, 150), makeTransparent(wave01col, 150), makeTransparent(rangecol, 60)), legend = c('Wave00', 'Wave01', 'AW range'))

#dev.off()
```

## What proportion of wave01 fall within Andy Wiltshire's constraints?

Just under a third. Points at a significant model discrepency in cVeg

Of the 400 members of the wave01 ensemble, 128 pass Andy Wiltshire's Level 2 constraints.

```{r}

length(level2_ix_wave01)
length(level2_ix_wave01) / ntrain_wave01


```



```{r plot-carbon-cycle-timeseries-primary, fig.width = 10, fig.height = 12}

lcol_wave0 <- makeTransparent('dodgerblue2',  120)
lcol_wave01 <- makeTransparent('firebrick',  120)
lcol_wave01_level2 <- 'gold'
stancol = 'black'

linePlotMultiEns <- function(years, ens1, ens2, ens3, col1, col2, col3, ylab, main, ylim = NULL, ...){
  # Plot wave00 and wave01 timeseries on top of one another
  
  nt <- length(years) 
  if(is.null(ylim)){
    
  ylim = range(c(ens1[,1], ens1[,nt], ens2[,1], ens2[ ,nt], ens3[,1], ens3[, nt]))
  }
  
  else ylim <- ylim
  
  matplot(years, t(ens1), type = 'l', lty = 'solid',ylim = ylim, col = col1,
        ylab = ylab, main = main, xlab = '',
        bty = 'n', ...)
  matlines(years, t(ens2), col = col2, lty = 'solid')
    matlines(years, t(ens3), col = col3, lty = 'solid')
}

pdf(file = 'figs/fig03.pdf', width = 10, height = 12)
#pdf(file = 'figs/carbon-cycle-timeseries-waves-constrained.pdf', width = 10, height = 12)
par(mfrow= c(3,5), las = 1, mar = c(4,4,1,0))

linePlotMultiEns(years = years, ens1 = npp_ens_wave00[without_outliers_ix_wave00,],
                 ens2 = npp_ens_wave01[without_outliers_ix_wave01,],
                 ens3 = npp_ens_wave01[level2a_ix_wave01, ],
                 col1 = lcol_wave0, col2 = lcol_wave01, col3 = lcol_wave01_level2,
                 ylab = 'GtC', main = 'NPP')

lines(years,npp_stan, col = stancol, lty = 'solid', lwd = 2)

linePlotMultiEns(years = years, ens1 =  nbp_ens_wave00[without_outliers_ix_wave00,], 
                 ens2 = nbp_ens_wave01[without_outliers_ix_wave01,],
                 ens3 = nbp_ens_wave01[level2a_ix_wave01, ],
                 col1 = lcol_wave0, col2 = lcol_wave01,col3 = lcol_wave01_level2,
                 ylab = 'GtC', main = 'NBP', ylim = c(-10,10))

lines(years, nbp_stan, col = stancol, lty = 'solid', lwd = 2)

linePlotMultiEns(years = years, ens1 = cSoil_ens_wave00[without_outliers_ix_wave00,],
                 ens2 = cSoil_ens_wave01[without_outliers_ix_wave01,],
                 ens3 = cSoil_ens_wave01[level2a_ix_wave01, ],
                 col1 = lcol_wave0, col2 = lcol_wave01, col3 = lcol_wave01_level2,
                 ylab = 'GtC', main = 'cSoil', ylim = range(c(cSoil_ens_wave00[,1], cSoil_ens_wave00[,164])))

lines(years, cSoil_stan, col = stancol, lty = 'solid', lwd = 2)

linePlotMultiEns(years = years, ens1 = cVeg_ens_wave00[without_outliers_ix_wave00,],
                 ens2 = cVeg_ens_wave01[without_outliers_ix_wave01,],
                 ens3 = cVeg_ens_wave01[level2a_ix_wave01, ],
                 col1 = lcol_wave0, col2 = lcol_wave01, col3 = lcol_wave01_level2,
                 ylab = 'GtC', main = 'cVeg')

lines(years, cVeg_stan, col = stancol, lty = 'solid', lwd = 2)

linePlotMultiEns(years = years, ens1 = lai_lnd_mean_ens_wave00[without_outliers_ix_wave00,],
                 ens2 = lai_lnd_mean_ens_wave01[without_outliers_ix_wave01,],
                 ens3 = lai_lnd_mean_ens_wave01[level2a_ix_wave01, ],
                 col1 = lcol_wave0, col2 = lcol_wave01, col3 = lcol_wave01_level2,
                 ylab = 'GtC', main = 'Lai')

lines(years, lai_lnd_mean_stan, col = stancol, lty = 'solid', lwd = 2)

linePlotMultiEns(years = years, ens1 = rh_lnd_sum_ens_wave00[without_outliers_ix_wave00,],
                 ens2 = rh_lnd_sum_ens_wave01[without_outliers_ix_wave01,],
                 ens3 = rh_lnd_sum_ens_wave01[level2a_ix_wave01, ],
                 col1 = lcol_wave0, col2 = lcol_wave01,  col3 = lcol_wave01_level2,
                 ylab = 'GtC', main = 'RH')

lines(years, rh_lnd_sum_stan, col = stancol, lty = 'solid', lwd = 2)

linePlotMultiEns(years = years, ens1 = fLuc_lnd_sum_ens_wave00[without_outliers_ix_wave00,],
                 ens2 = fLuc_lnd_sum_ens_wave01[without_outliers_ix_wave01,],
                 ens3 = fLuc_lnd_sum_ens_wave01[level2a_ix_wave01, ],
                 col1 = lcol_wave0, col2 = lcol_wave01, col3 = lcol_wave01_level2,
                 ylab = 'GtC', main = 'fLuc')

lines(years, fLuc_lnd_sum_stan, col = stancol, lty = 'solid', lwd = 2)

linePlotMultiEns(years = years, ens1 = fHarvest_lnd_sum_ens_wave00[without_outliers_ix_wave00,],
                 ens2 = fHarvest_lnd_sum_ens_wave01[without_outliers_ix_wave01,],
                 ens3 = fHarvest_lnd_sum_ens_wave01[level2a_ix_wave01, ],
                 col1 = lcol_wave0, col2 = lcol_wave01, col3 = lcol_wave01_level2,
                 ylab = 'GtC', main = 'fHarvest')

lines(years, fHarvest_lnd_sum_stan, col = stancol, lty = 'solid', lwd = 2)

linePlotMultiEns(years = years, ens1 = treeFrac_lnd_mean_ens_wave00[without_outliers_ix_wave00,],
                 ens2 = treeFrac_lnd_mean_ens_wave01[without_outliers_ix_wave01,],
                 ens3 = treeFrac_lnd_mean_ens_wave01[level2a_ix_wave01, ],
                 col1 = lcol_wave0, col2 = lcol_wave01, col3 = lcol_wave01_level2,
                 ylab = '%', main = 'treefrac'
                 )

lines(years, treeFrac_lnd_mean_stan, col = stancol, lty = 'solid', lwd = 2)

linePlotMultiEns(years = years, ens1 = shrubFrac_lnd_mean_ens_wave00[without_outliers_ix_wave00,],
                 ens2 = shrubFrac_lnd_mean_ens_wave01[without_outliers_ix_wave01,],
                 ens3 = shrubFrac_lnd_mean_ens_wave01[level2a_ix_wave01, ],
                 col1 = lcol_wave0, col2 = lcol_wave01, col3 = lcol_wave01_level2,
                 ylab = '%', main = 'shrubfrac'
)

lines(years, shrubFrac_lnd_mean_stan, col = stancol, lty = 'solid', lwd = 2)

linePlotMultiEns(years = years, ens1 = baresoilFrac_lnd_mean_ens_wave00[without_outliers_ix_wave00,],
                 ens2 = baresoilFrac_lnd_mean_ens_wave01[without_outliers_ix_wave01,],
                 ens3 = baresoilFrac_lnd_mean_ens_wave01[level2a_ix_wave01, ],
                 col1 = lcol_wave0, col2 = lcol_wave01, col3 = lcol_wave01_level2,
                 ylab = '%', main = 'baresoilfrac')

lines(years, baresoilFrac_lnd_mean_stan, col = stancol, lty = 'solid', lwd = 2)


linePlotMultiEns(years = years, c3PftFrac_lnd_mean_ens_wave00[without_outliers_ix_wave00,],
                 ens2 = c3PftFrac_lnd_mean_ens_wave01[without_outliers_ix_wave01,],
                 ens3 = c3PftFrac_lnd_mean_ens_wave01[level2a_ix_wave01, ],
                 col1 = lcol_wave0, col2 = lcol_wave01, col3 = lcol_wave01_level2,
                 ylab = '%', main = 'c3PftFrac')

lines(years, c3PftFrac_lnd_mean_stan, col = stancol, lty = 'solid', lwd = 2)


linePlotMultiEns(years = years, c4PftFrac_lnd_mean_ens_wave00[without_outliers_ix_wave00,],
                 ens2 = c4PftFrac_lnd_mean_ens_wave01[without_outliers_ix_wave01,],
                 ens3 = c4PftFrac_lnd_mean_ens_wave01[level2a_ix_wave01, ],
                 col1 = lcol_wave0, col2 = lcol_wave01, col3 = lcol_wave01_level2,
                 ylab = '%', main = 'c4PftFrac')

lines(years, c4PftFrac_lnd_mean_stan, col = stancol, lty = 'solid', lwd = 2)


reset()

legend('bottomright', legend = c('wave00','wave01','wave01 level2','standard'), lty = 'solid', lwd = 1.5, col = c(lcol_wave0, lcol_wave01, lcol_wave01_level2, stancol), inset = c(0.05, 0.15) )

dev.off()
```

```{r, fig.width = 10, fig.height = 12}

pdf(file = 'figs/fig04.pdf', width = 10, height = 12)
#pdf(file = 'figs/carbon-cycle-timeseries-anomaly-waves-constrained.pdf', width = 10, height = 12)
par(mfrow= c(3,5), las = 1, mar = c(4,4,1,0))

linePlotMultiEns(years = years, ens1 = npp_ens_anom_wave00[without_outliers_ix_wave00,],
                 ens2 = npp_ens_anom_wave01[without_outliers_ix_wave01,],
                 ens3 = npp_ens_anom_wave01[level2a_ix_wave01, ],
                 col1 = lcol_wave0, col2 = lcol_wave01, col3 = lcol_wave01_level2,
                 ylab = 'GtC', main = 'NPP')

lines(years,npp_stan_anom, col = stancol, lty = 'solid', lwd = 2)
linePlotMultiEns(years = years, ens1 =  nbp_ens_anom_wave00[without_outliers_ix_wave00,], 
                 ens2 = nbp_ens_anom_wave01[without_outliers_ix_wave01,],
                 ens3 = nbp_ens_anom_wave01[level2a_ix_wave01, ],
                 col1 = lcol_wave0, col2 = lcol_wave01,col3 = lcol_wave01_level2,
                 ylab = 'GtC', main = 'NBP', ylim = c(-10,10))

lines(years, nbp_stan_anom, col = stancol, lty = 'solid', lwd = 2)

linePlotMultiEns(years = years, ens1 = cSoil_ens_anom_wave00[without_outliers_ix_wave00,],
                 ens2 = cSoil_ens_anom_wave01[without_outliers_ix_wave01,],
                 ens3 = cSoil_ens_anom_wave01[level2a_ix_wave01, ],
                 col1 = lcol_wave0, col2 = lcol_wave01, col3 = lcol_wave01_level2,
                 ylab = 'GtC', main = 'cSoil', ylim = range(c(cSoil_ens_anom_wave00[,1], cSoil_ens_anom_wave00[,164])))

lines(years, cSoil_stan_anom, col = stancol, lty = 'solid', lwd = 2)

linePlotMultiEns(years = years, ens1 = cVeg_ens_anom_wave00[without_outliers_ix_wave00,],
                 ens2 = cVeg_ens_anom_wave01[without_outliers_ix_wave01,],
                 ens3 = cVeg_ens_anom_wave01[level2a_ix_wave01, ],
                 col1 = lcol_wave0, col2 = lcol_wave01, col3 = lcol_wave01_level2,
                 ylab = 'GtC', main = 'cVeg')

lines(years, cVeg_stan_anom, col = stancol, lty = 'solid', lwd = 2)

linePlotMultiEns(years = years, ens1 = lai_lnd_mean_ens_anom_wave00[without_outliers_ix_wave00,],
                 ens2 = lai_lnd_mean_ens_anom_wave01[without_outliers_ix_wave01,],
                 ens3 = lai_lnd_mean_ens_anom_wave01[level2a_ix_wave01, ],
                 col1 = lcol_wave0, col2 = lcol_wave01, col3 = lcol_wave01_level2,
                 ylab = 'GtC', main = 'Lai')

lines(years, lai_lnd_mean_stan_anom, col = stancol, lty = 'solid', lwd = 2)

linePlotMultiEns(years = years, ens1 = rh_lnd_sum_ens_anom_wave00[without_outliers_ix_wave00,],
                 ens2 = rh_lnd_sum_ens_anom_wave01[without_outliers_ix_wave01,],
                 ens3 = rh_lnd_sum_ens_anom_wave01[level2a_ix_wave01, ],
                 col1 = lcol_wave0, col2 = lcol_wave01,  col3 = lcol_wave01_level2,
                 ylab = 'GtC', main = 'RH')

lines(years, rh_lnd_sum_stan_anom, col = stancol, lty = 'solid', lwd = 2)

linePlotMultiEns(years = years, ens1 = fLuc_lnd_sum_ens_anom_wave00[without_outliers_ix_wave00,],
                 ens2 = fLuc_lnd_sum_ens_anom_wave01[without_outliers_ix_wave01,],
                 ens3 = fLuc_lnd_sum_ens_anom_wave01[level2a_ix_wave01, ],
                 col1 = lcol_wave0, col2 = lcol_wave01, col3 = lcol_wave01_level2,
                 ylab = 'GtC', main = 'fLuc')

lines(years, fLuc_lnd_sum_stan_anom, col = stancol, lty = 'solid', lwd = 2)
 
linePlotMultiEns(years = years, ens1 = fHarvest_lnd_sum_ens_anom_wave00[without_outliers_ix_wave00,],
                 ens2 = fHarvest_lnd_sum_ens_anom_wave01[without_outliers_ix_wave01,],
                 ens3 = fHarvest_lnd_sum_ens_anom_wave01[level2a_ix_wave01, ],
                 col1 = lcol_wave0, col2 = lcol_wave01, col3 = lcol_wave01_level2,
                 ylab = 'GtC', main = 'fHarvest')

lines(years, fHarvest_lnd_sum_stan_anom, col = stancol, lty = 'solid', lwd = 2)

linePlotMultiEns(years = years, ens1 = treeFrac_lnd_mean_ens_anom_wave00[without_outliers_ix_wave00,],
                 ens2 = treeFrac_lnd_mean_ens_anom_wave01[without_outliers_ix_wave01,],
                 ens3 = treeFrac_lnd_mean_ens_anom_wave01[level2a_ix_wave01, ],
                 col1 = lcol_wave0, col2 = lcol_wave01, col3 = lcol_wave01_level2,
                 ylab = '%', main = 'treefrac'
                 )

lines(years, treeFrac_lnd_mean_stan_anom, col = stancol, lty = 'solid', lwd = 2)

linePlotMultiEns(years = years, ens1 = shrubFrac_lnd_mean_ens_anom_wave00[without_outliers_ix_wave00,],
                 ens2 = shrubFrac_lnd_mean_ens_anom_wave01[without_outliers_ix_wave01,],
                 ens3 = shrubFrac_lnd_mean_ens_anom_wave01[level2a_ix_wave01, ],
                 col1 = lcol_wave0, col2 = lcol_wave01, col3 = lcol_wave01_level2,
                 ylab = '%', main = 'shrubfrac'
)

lines(years, shrubFrac_lnd_mean_stan_anom, col = stancol, lty = 'solid', lwd = 2)

linePlotMultiEns(years = years, ens1 = baresoilFrac_lnd_mean_ens_anom_wave00[without_outliers_ix_wave00,],
                 ens2 = baresoilFrac_lnd_mean_ens_anom_wave01[without_outliers_ix_wave01,],
                 ens3 = baresoilFrac_lnd_mean_ens_anom_wave01[level2a_ix_wave01, ],
                 col1 = lcol_wave0, col2 = lcol_wave01, col3 = lcol_wave01_level2,
                 ylab = '%', main = 'baresoilfrac')

lines(years, baresoilFrac_lnd_mean_stan_anom, col = stancol, lty = 'solid', lwd = 2)


linePlotMultiEns(years = years, c3PftFrac_lnd_mean_ens_anom_wave00[without_outliers_ix_wave00,],
                 ens2 = c3PftFrac_lnd_mean_ens_anom_wave01[without_outliers_ix_wave01,],
                 ens3 = c3PftFrac_lnd_mean_ens_anom_wave01[level2a_ix_wave01, ],
                 col1 = lcol_wave0, col2 = lcol_wave01, col3 = lcol_wave01_level2,
                 ylab = '%', main = 'c3PftFrac')

lines(years, c3PftFrac_lnd_mean_stan_anom, col = stancol, lty = 'solid', lwd = 2)


linePlotMultiEns(years = years, c4PftFrac_lnd_mean_ens_anom_wave00[without_outliers_ix_wave00,],
                 ens2 = c4PftFrac_lnd_mean_ens_anom_wave01[without_outliers_ix_wave01,],
                 ens3 = c4PftFrac_lnd_mean_ens_anom_wave01[level2a_ix_wave01, ],
                 col1 = lcol_wave0, col2 = lcol_wave01, col3 = lcol_wave01_level2,
                 ylab = '%', main = 'c4PftFrac')

lines(years, c4PftFrac_lnd_mean_stan_anom, col = stancol, lty = 'solid', lwd = 2)

reset()

legend('bottomright', legend = c('wave00','wave01','wave01 level2','standard'), lty = 'solid', lwd = 1.5, col = c(lcol_wave0, lcol_wave01, lcol_wave01_level2, stancol), inset = c(0.05, 0.15) )


dev.off()

``` 

## Exploring summary of constraint: Cumulative NBP


### Carbon budget data

[Section 2.5 in Friedlingstein et al.](https://essd.copernicus.org/articles/12/3269/2020/#section2) describes how the land carbon sink is estimated.
```{r, historical-carbon-budget, warning = FALSE, fig.width = 10, fig.height = 10}

#pdf(file = 'carbon_budget.pdf', width = 10, height = 8)
# Question: How closely should our model match this curve? Which output?
# (My guess is 'Total Land Carbon anomaly')
historical_carbon_budget <- read_excel('data/Global_Carbon_Budget_2020v1.0.xlsx', sheet = "Historical Budget", skip = 15, n_max = 270)

par(mfrow = c(3,2))
ylim = c(-1, 6)


land_sink_net <- historical_carbon_budget$`land sink` - historical_carbon_budget$`land-use change emissions`

plot(historical_carbon_budget$Year, historical_carbon_budget$`fossil emissions excluding carbonation`, main = 'fossil emissions excluding carbonation', ylab = '',
     type = 'l', bty = 'n', ylim = ylim)

plot(historical_carbon_budget$Year, historical_carbon_budget$`land sink`, main = 'land sink', ylab = '', type = 'l', bty = 'n', ylim = ylim)

lines(historical_carbon_budget$Year, land_sink_net, col = 'red')

legend('topleft', legend = 'land sink - land use change emissions', col = 'red', lty = 'solid')

plot(historical_carbon_budget$Year, historical_carbon_budget$`land-use change emissions`, main = 'land use change emissions',
     ylab = '', type = 'l', bty = 'n', ylim = ylim)

plot(historical_carbon_budget$Year, historical_carbon_budget$`atmospheric growth`, main = 'atmospheric growth', ylab = '',
     type = 'l', bty = 'n', ylim = ylim)


plot(historical_carbon_budget$Year, historical_carbon_budget$`ocean sink`, main = 'ocean sink', type = 'l', bty = 'n', ylim = ylim)

plot(historical_carbon_budget$Year, historical_carbon_budget$`budget imbalance`, main = 'budget imbalance', ylab = '', type = 'l', bty = 'n', ylim = ylim)

#dev.off()
```
IPCC AR6 Chapter 5 states: "The net land carbon sink is taken as net biome productivity (NBP) and so includes any modelled net land-use 
change emissions. Further details on data sources and processing are available in the chapter data table (Table 5.SM.6)."

The GCP says: 
"The land sink is the average of several dynamic global vegetation models that reproduce the observed mean total land sink of the 1990s."

AR6 Chapter 5 states:  "The land carbon cycle components of historical ESM simulations show a larger range, with simulated cumulative land carbon uptake (1850–2014) spanning the range from –47 to +21 GtC, compared to the GCP estimate of –12 ± 50 GtC (Figure 5.23b).
```{r, fig.width=7, fig.height = 7}


plot(historical_carbon_budget$Year, historical_carbon_budget$`land sink`,
     main = 'land sink', ylab = '', type = 'l', bty = 'n', ylim = ylim)


plot(historical_carbon_budget$Year, cumsum(historical_carbon_budget$`land sink`),
     main = 'land sink', ylab = '', type = 'l', bty = 'n')


#plot(historical_carbon_budget$Year, cumsum(land_sink_net, na.rm = TRUE),
#     main = 'land sink', ylab = '', type = 'l', bty = 'n')

match_years_ix <- which(historical_carbon_budget$Year %in% 1850:2013)

match_years <- historical_carbon_budget$Year[match_years_ix]
cumulative_net_land_sink <- cumsum(land_sink_net[match_years_ix])


par(las = 1)
plot(match_years, cumulative_net_land_sink,
     type = 'l', main = "Cumulative Net Land Sink", xlab = "", ylab = "GtC",
     ylim = c(-80, 20))
```


```{r, fig.width = 5, fig.height = 8}

par(mar = c(8,4,4,1), las = 1)
cnbp_ens_wave00 <- t(apply(nbp_ens_wave00, 1, FUN = cumsum))
cnbp_ens_wave01 <- t(apply(nbp_ens_wave01, 1, FUN = cumsum))
cnbp_stan <- cumsum(nbp_stan)


linePlotMultiEns(years = years, ens1 =  cnbp_ens_wave00[without_outliers_ix_wave00,], 
                 ens2 = cnbp_ens_wave01[without_outliers_ix_wave01,],
                 ens3 = cnbp_ens_wave01[level2a_ix_wave01, ],
                 col1 = lcol_wave0, col2 = lcol_wave01,col3 = lcol_wave01_level2,
                 ylab = 'Cumulative NBP (GtC)', main = 'Cumulative NBP', xlim = c(1850, 2035))

lines(years, cnbp_stan, col = 'black', lwd = 2)

lines(match_years, cumulative_net_land_sink, lty = 'dashed', col = 'black')

arrows(2022,-47, 2022,21, angle = 90, length = 0.05, code = 3)

arrows(2035,(-12-50), 2035,(-12+50),  angle = 90, length = 0.05, code = 3)

text(2022, 30, 'AR6', cex = 0.8)

text(2035, 47, 'GCP', cex = 0.8)
reset()


legend('bottom', legend = c('wave00','wave01','wave01 level2','standard', 'GCP estimate'), lty = c('solid', 'solid', 'solid', 'solid', 'dashed'), lwd = 1.5, col = c(lcol_wave0, lcol_wave01, lcol_wave01_level2, stancol, stancol), horiz = FALSE, inset = 0.01, cex = 0.8)


```




# How much are all select output constrained?
```{r}

selected_tags <- c('npp', 'nbp', 'cSoil', 'cVeg', 'lai_lnd_mean', 'rh_lnd_sum', 'fLuc_lnd_sum', 'fHarvest_lnd_sum', 'treeFrac_lnd_mean', 'shrubFrac_lnd_mean', 'baresoilFrac_lnd_mean', 'c3PftFrac_lnd_mean', 'c4PftFrac_lnd_mean')

# wave00
selected_tags_vec_wave00 <- paste0(selected_tags, '_ens_wave00')

mv_means_wave00 <- matrix(ncol = length(selected_tags), nrow = nrow(npp_ens_wave00))
colnames(mv_means_wave00) <- selected_tags

for(i in 1:length(selected_tags_vec_wave00)){
  
  dat <- get(selected_tags_vec_wave00[i])
  
  colix <- 1:ncol(dat)
  trunc_dat <- dat[, tail(colix,20)]
  
  mean_dat <- apply(trunc_dat, 1, mean, na.rm = TRUE)
  mv_means_wave00[,i ] <- mean_dat
  
}

## wave01
selected_tags_vec_wave01 <- paste0(selected_tags, '_ens_wave01')
mv_means_wave01 <- matrix(ncol = length(selected_tags), nrow = nrow(npp_ens_wave01))
colnames(mv_means_wave01) <- selected_tags

for(i in 1:length(selected_tags_vec_wave01)){
  
  dat <- get(selected_tags_vec_wave01[i])
  
  colix <- 1:ncol(dat)
  trunc_dat <- dat[, tail(colix,20)]
  
  mean_dat <- apply(trunc_dat, 1, mean, na.rm = TRUE)
  mv_means_wave01[,i ] <- mean_dat
  
}


# wave00 anomaly
selected_tags_vec_anom_wave00 <- paste0(selected_tags, '_ens_anom_wave00')

mv_means_anom_wave00 <- matrix(ncol = length(selected_tags), nrow = nrow(npp_ens_anom_wave00))
colnames(mv_means_anom_wave00) <- selected_tags

for(i in 1:length(selected_tags_vec_anom_wave00)){
  
  dat <- get(selected_tags_vec_anom_wave00[i])
  
  colix <- 1:ncol(dat)
  trunc_dat <- dat[, tail(colix,20)]
  
  mean_dat <- apply(trunc_dat, 1, mean, na.rm = TRUE)
  mv_means_anom_wave00[,i ] <- mean_dat
  
}

# wave01 anomaly
selected_tags_vec_anom_wave01 <- paste0(selected_tags, '_ens_anom_wave01')

mv_means_anom_wave01 <- matrix(ncol = length(selected_tags), nrow = nrow(npp_ens_anom_wave01))
colnames(mv_means_anom_wave01) <- selected_tags

for(i in 1:length(selected_tags_vec_anom_wave01)){
  
  dat <- get(selected_tags_vec_anom_wave01[i])
  
  colix <- 1:ncol(dat)
  trunc_dat <- dat[, tail(colix,20)]
  
  mean_dat <- apply(trunc_dat, 1, mean, na.rm = TRUE)
  mv_means_anom_wave01[,i ] <- mean_dat
  
}

```

```{r}

range_wave00 <- apply(mv_means_wave00[without_outliers_ix_wave00,], 2 , range) # Wave00 sets the initial range
range_wave01 <- apply(mv_means_wave01[without_outliers_ix_wave01,], 2, range)
range_wave01_level2a <- apply(mv_means_wave01[level2a_ix_wave01, ], 2, range)

# What is the output range of the level2a modern value, expressed as a proportion of the initial ensemble?
range_prop_wave01_level2a <-  (apply(range_wave01_level2a, 2, diff) / apply(range_wave00, 2, diff)) *100


range_anom_wave00 <- apply(mv_means_anom_wave00[without_outliers_ix_wave00,], 2 , range) # Wave00 sets the initial range
range_anom_wave01 <- apply(mv_means_anom_wave01[without_outliers_ix_wave01,], 2, range)
range_anom_wave01_level2a <- apply(mv_means_anom_wave01[level2a_ix_wave01, ], 2, range)

# What is the output range of the level2a modern value, expressed as a proportion of the initial ensemble?
range_prop_anom_wave01_level2a <-  (apply(range_anom_wave01_level2a, 2, diff) / apply(range_anom_wave00, 2, diff)) *100


```

```{r}
all_waves_mv <- rbind(mv_means_wave00[without_outliers_ix_wave00,], mv_means_wave01[without_outliers_ix_wave01,])

range_all_waves <- apply(all_waves_mv, 2, range)

level2_ix_all_waves_mv <- which(all_waves_mv[,'nbp'] > 0 &
                                  all_waves_mv[,'npp'] > 35 &
                                  all_waves_mv[,'npp'] < 80 &
                                  all_waves_mv[,'cSoil'] > 750 &
                                  all_waves_mv[,'cSoil'] < 3000 &
                                  all_waves_mv[,'cVeg'] > 300 &
                                  all_waves_mv[,'cVeg'] < 800
)

all_waves_mv_level2 <- all_waves_mv[level2_ix_all_waves_mv, ]


all_waves_col <- rep(zred, nrow(all_waves_mv))

all_waves_level2_col <- all_waves_col
all_waves_level2_col[level2_ix_all_waves_mv] <- zblue

range_all_waves_level2 <- apply(all_waves_mv[level2_ix_all_waves_mv, ], 2, range)

range_prop_all_waves_level2 <-(apply(range_all_waves_level2, 2, diff) / apply(range_all_waves, 2, diff)) *100

# Anomaly
all_waves_anom_mv <- rbind(mv_means_anom_wave00[without_outliers_ix_wave00,], mv_means_anom_wave01[without_outliers_ix_wave01,])

range_all_waves_anom <- apply(all_waves_anom_mv, 2, range)

all_waves_anom_mv_level2 <- all_waves_anom_mv[level2_ix_all_waves_mv, ]

range_all_waves_anom_level2 <- apply(all_waves_anom_mv[level2_ix_all_waves_mv, ], 2, range)

range_prop_all_waves_anom_level2 <-(apply(range_all_waves_anom_level2, 2, diff) / apply(range_all_waves_anom, 2, diff)) *100




```

### Constraint of output at level 2
Proportion of the initial range of model output that is covered by level 2 constrained ensemble (%):
```{r}
print(round(range_prop_all_waves_level2,1))
                                
```



```{r, fig.width = 8, fig.height = 8}

pdf(file = "figs/induced_constraint_barplot.pdf")
range_prop_round = round(range_prop_all_waves_level2,1)
range_prop_round_mat <- rbind(range_prop_round, (100 - range_prop_round))

range_prop_anom_round = round(range_prop_all_waves_anom_level2,1)
range_prop_anom_round_mat <- rbind(range_prop_anom_round, (100 - range_prop_anom_round))

par(las = 1, mar = c(8,1,3,2), mfrow = c(1,2), oma = c(1,10, 1,1))
b1 <- barplot(t(apply(range_prop_round_mat,1,rev)), 
        horiz = TRUE,
        col =  c( zblue, zred),
        xlab = 'Ensemble output range (%)',
        main = 'Absolute'
        )
text(0, rev(b1), range_prop_round, pos = 4, cex = 0.9)

b2 <- barplot(t(apply(range_prop_anom_round_mat,1,rev)), 
        horiz = TRUE,
        col =  c( zblue, zred),
        xlab = 'Ensemble output range (%)',
        names.arg = rep('', 13),
        main = 'Anomaly'
        )
text(0, rev(b2), range_prop_anom_round, pos = 4, cex = 0.9)

axis(1)
reset()
legend('bottom', legend = c('NROY', 'Ruled out'), fill = c(zblue, zred), horiz = TRUE, inset = 0.02 )
dev.off()


```

```{r, fig.width = 6, fig.height = 8}


axat = length(range_prop_round):1

par(las = 1, mar = c(5,10,3,2), xaxs = 'i')
plot(rev(range_prop_round), axat, xlim = c(0,100), axes = FALSE, type = 'n', ylab = '', xlab = 'Level 2 NROY proportion of whole range (%)')
#abline(h = axat, col = 'grey', lty = 'dashed')
#abline(v = 100)
points(rev(range_prop_anom_round), rev(axat), col = 'grey', pch = 19)
segments(x0 = 0, y0 = rev(axat), x1 = rev(range_prop_anom_round), y1 = rev(axat), col = 'grey', lty = 'dashed')


points(rev(range_prop_round), rev(axat), col = 'black', pch = 19)
segments(x0 = 0, y0 = rev(axat), x1 = rev(range_prop_round), y1 = rev(axat))
#segments(x0 = 0, y0 = rev(axat), x1 = rev(range_prop_anom_round), y1 = rev(axat), col = 'grey', lty = 'dashed')

axis(2, at  = axat, labels = selected_tags)
axis(1)
reset()
legend('top', c('Absolute', 'Anomaly'), pch = 19, col = c('black', 'grey'), horiz = TRUE, inset = 0.03)


```


### Pairs plot of 2d projections of constrained output space.

```{r, fig.width = 12, fig.height = 12}
#pdf(file = 'figs/output_pairs.pdf', width = 10, height = 10)
pairs(all_waves_mv, col = all_waves_level2_col, lower.panel = NULL, gap = 0,
      pch = 20,
      labels = 1:13
      )
reset()
legend('topleft', legend = c('NROY', 'Ruled out' ), pch = 20, col = c(zblue, zred), inset = c(0.03, 0.25))
legend('left', legend = c(paste(1:13, colnames(all_waves_mv))), inset = 0.03, cex = 1)
#dev.off()


```




## Constraining to level 2 with the emulator



```{r}
nunif <- 50000
X_unif <- samp_unif(nunif, mins = rep(0,32), maxes = rep(1, 32))
colnames(X_unif) <- colnames(X)
```



```{r}

# Can this go in common data? Would be needed for checking emulator fits
# Create fit lists for the combined data wave00 level 1a and wave01
Y_const_level1a_wave01_scaled_list <- mat2list(Y_const_level1a_wave01_scaled)

fit_list_const_level1a_wave01 <- mclapply(X = Y_const_level1a_wave01_scaled_list , FUN = km, formula = ~., design = X_level1a_wave01,
                                   mc.cores = 4, control = list(trace = FALSE))


```


```{r}

Y_const_level1a_wave01_scaled_pred <- multiPred(Y = Y_const_level1a_wave01_scaled, Xpred = X_unif, fit_list = fit_list_const_level1a_wave01)

```



```{r}


level2_ix_em_unif_wave00_wave01 <- which(Y_const_level1a_wave01_scaled_pred$pred_mean[,'nbp_lnd_sum'] > 0 &
                                           Y_const_level1a_wave01_scaled_pred$pred_mean[,'npp_nlim_lnd_sum'] > 35 & 
                                           Y_const_level1a_wave01_scaled_pred$pred_mean[,'npp_nlim_lnd_sum'] < 80 &
                                           Y_const_level1a_wave01_scaled_pred$pred_mean[,'cSoil_lnd_sum'] > 750 &
                                           Y_const_level1a_wave01_scaled_pred$pred_mean[,'cSoil_lnd_sum'] < 3000 &
                                           Y_const_level1a_wave01_scaled_pred$pred_mean[,'cVeg_lnd_sum'] > 300 & 
                                           Y_const_level1a_wave01_scaled_pred$pred_mean[,'cVeg_lnd_sum'] < 800
)


(length(level2_ix_em_unif_wave00_wave01) / nunif) * 100

```


```{r}
X_stan_norm <- normalize(matrix(rep(1, 32), nrow = 1), wrt = lhs)

colnames(X_unif) <- 1:32

```


```{r,fig.width = 12, fig.height = 12, warning = FALSE}

#pdf(file = 'figs/fig06.pdf', width = 12, height = 12)
#pdf(file = 'figs/pairs_level2_ix_em_unif_wave00_wave01.pdf', width = 12, height = 12)

par(oma = c(0,0,0,3), bg = 'white')

panel_hist_local <- function(x, ...)
{
    usr <- par("usr"); on.exit(par(usr))
    par(usr = c(usr[1:2], 0, 1.5) )
    h <- hist(x, plot = FALSE)
    breaks <- h$breaks; nB <- length(breaks)
    y <- h$counts; y <- y/max(y)
    rect(breaks[-nB], 0, breaks[-1], y, col = "cyan", ...)
}

pairs(rbind(X_unif[level2_ix_em_unif_wave00_wave01, ], X_stan_norm),
      gap = 0, lower.panel = NULL, xlim = c(0,1), ylim = c(0,1),
      panel = dfunc_up_truth,
      diag.panel = panel_hist_local,
      cex.labels = 1,
      col.axis = 'white',
      dfunc_col = rb
      )


image.plot(legend.only = TRUE,
           zlim = c(0,1),
           col = rb,
           legend.args = list(text = 'Density of model runs matching the criteria', side = 3, line = 1),
           legend.shrink = 0.6,
           horizontal = TRUE
)

legend('left', legend = paste(1:32, colnames(lhs)), cex = 1, bty = 'n')

#dev.off()
```




